"From an OSM Way to a Highway Shield" by Arielle Simmons-Steffen and Matt Kenny Live captioning by Norma Miller. @whitecoatcapxg Let's welcome Arielle and Matt. [applause] Good morning. So like she said, my name is Ari and I'm here with my colleague, Matt, we're both GIS engineers at Tableau. First off, does anybody know what Tableau is? Show of hands? Put your hand down if you're from Seattle. [laughter] Well, I'm still impressed. So Tableau low specialized in data visualization, one of the features of that product is a map, so our customers can display their data on a map, a slippy map basically and we also have a geo coding product and Matt and I also work quite intimately with these two products. How do we use OpenStreetMap? OpenStreetMap mostly comes into play with our product from the base map. We use it to visualize features such as roads and water. This is an example of a rather simple Tableau vis that colleague and I baked a few weeks ago with Brexit vote. If we had the internet, I could show you guys that we actually filter according these of and it gives that interactive mapping ability to our customers. Another feature, this is obviously Nepal. This is something I did right after the Nepal earthquake. There's something like 200,000 ways in here that I queried in overpass from OSM and on this map our customers can filter, we can, you know, hover over a road, learn more about it, the metadata that I queried out of overpass, so yeah, it's a pretty cool tool and it, you for example this is how our customers use it. So why am I here? Today I want to tell you how we got road shields out of OSM ways. You know, it seems like a rather, oh, well, this sounds simple, but it's complicated by the fact that we do have this product that we envision our customers putting their data on the map, so the main point of the visualization isn't actually the base map, it's actually their data, so we don't want the map to be too cluttered. 
Specifically, with a lot of road shields, which would happen if we told OSM, hey, for every way, give me a road shield, that would be pretty dense. So we really are aiming for, as you can see here in Seattle, this very simple view. And we also needed a set that was comprehensive. We also needed different rendering styles for both the United States and global shields, so for the United States, we wanted to be able to show graphically the difference between interstates and state highways and stuff like that. And we also wanted to be able to adjust the label density by zoom level, which, we found out, made it very necessary to have points instead of lines. So essentially we needed to go from an OSM way, which was a line, to a point. So I'm going to hand this over to Matt so he can tell you more about the data extraction.

>> Matt: Cool, thanks, Ari. Yeah, so as Ari mentioned, the basic problem is lines to points, right? And so to actually load the OSM planet file into a Postgres database, we used the Imposm tool. This generates a series of tables that are separated by logical groupings of themes, like motorways, land use areas, etc., and for our purposes we looked at data that were inside of the OSM motorways table. Looking at this table, we wanted to extract out just a subset of linear features that represented data that we'd actually want to generate a shield point from, right? And so we found that there were a few tags of interest, specifically highway equals motorway and trunk, and then the link or ramp features for both of those kinds of tags. So once we identified which datasets we wanted to subset, which tags we wanted to subset, we went ahead and extracted them out as shapefiles. But the challenge that we ran into was that during our initial load using Imposm, we actually dropped the numerical indicator for an actual surface here.
It has a numerical identifier of WA Route 99, and what we realized is that in our initial load of the OSM planet file, we actually were only bringing in the name tag for that particular way, which was the more colloquial name, Aurora Avenue North. And so, kind of looking at the data, we realized, oh, well, we need these data in the ref tag. So we modified a mapping file that's part of the Imposm package that's designed to essentially extract out a particular subset of tags from the OSM planet file and convert them into these resultant Postgres tables, and once we modified that to include the ref tag, we reimported the planet and were able to have these roadways that contained both the name segment, which we wanted for the text label, as well as the data that we wanted to extract for the road shield. And so Ari is going to take it over and talk about how we actually process those linear features.

>> Ari: Thanks, Matt. So now we have lines, good start. And our goal is still to get a shapefile of points that can be used to generate these at different zoom levels, and we kind of need style rules. We're using an open source rendering service called MapServer. And we are going to use a font file to actually design the shields. So step one is taking this shapefile, a rather large global shapefile of lines, and segmenting out what is the United States and what is global. So we just did a spatial select, hey, you're in the United States, great, and then we also had, you know, everybody else, who is the globe, and then we ran it through two Python scripts. One was shield labels.py. We have also added these scripts to GitHub so you can use them and play with them if you like, and you can notice by the dependencies, we're a pretty open source-friendly company, so we used mostly open source tools to build these scripts.
So the first script that we ran it through was shield labels.py. The main job of this script was to curate the label that needed to be shown on top of the shield. It required the input of the line string shapefile. It needed to contain an attribute called ref; as Matt mentioned earlier, we had to modify the Imposm mapping file. And it also needed that spatial select that I mentioned. It's a longer script, but just to give you an overview of what it did, the first step was dissolve. Why did we dissolve? We needed to dissolve because, as many of you guys who have edited OpenStreetMap know, you aren't creating one long Highway 5 all the way to California. It's you and 5 million users, so we needed to dissolve into segments. We also needed to split and duplicate the segments. I'll get into that a little more. We needed to designate what type of shield it was, and lastly we needed to create the label and do some string edits on it.

So, splitting and duplicating the segments. Why did we have to split and duplicate the segments? Well, this happened because many of the things that we enter into OpenStreetMap are condensed into one line but two labels. What we really needed was two lines with one label apiece. So as you can see, in this ref tag there's two highways represented by this one line. I needed to break these apart, duplicate them and have one line per segment. I also needed to designate the shield type. Is it an interstate? Is it a US highway? And you know, that was done by basic regex and string searches. There's nothing like looking at a piece of regex that you wrote years ago when it's been too long since you've had coffee. This is a fabulous piece of text. It's removing any preceding letters, and if there are letters after the numbers, separated by, you know, a space or a dash, because sometimes you have 99-A or 67 -- I'm making up stuff, but 67-B -- we also needed to pick up that last letter.
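The split, classify and clean-up steps Ari describes could look roughly like the sketch below. This is not the code from shield labels.py, which is not shown in the talk. The function names, the regular expressions, the `I`/`US` prefixes and the choice to drop the dash before a trailing letter are all assumptions made for illustration. The only outside fact relied on is the OSM convention of separating multiple values in one tag with a semicolon.

```python
import re

# Hypothetical patterns for illustration; the talk's real regexes are not shown.
SHIELD_PATTERNS = [
    ("interstate", re.compile(r"^I[\s-]*\d+")),
    ("us_highway", re.compile(r"^US[\s-]*\d+")),
]

# Drop any leading non-digits, keep the number, and keep one trailing
# letter if it follows the number after optional spaces or dashes.
LABEL_RE = re.compile(r"^[^\d]*(\d+)(?:[\s-]*([A-Za-z]))?\s*$")


def split_refs(ref):
    """Split a multi-valued ref tag ("I 5;US 99") into single refs."""
    return [part.strip() for part in ref.split(";") if part.strip()]


def shield_type(ref):
    """Classify one ref by prefix; anything unmatched falls into a catch-all."""
    for name, pattern in SHIELD_PATTERNS:
        if pattern.match(ref.upper()):
            return name
    return "state_or_other"


def shield_label(ref):
    """Return the short text drawn on the shield, or None if no number is found."""
    match = LABEL_RE.match(ref)
    if match is None:
        return None
    number, suffix = match.groups()
    return number + (suffix.upper() if suffix else "")


def expand(ref):
    """One (shield_type, label) pair per route carried by a single way."""
    return [(shield_type(r), shield_label(r)) for r in split_refs(ref)]


if __name__ == "__main__":
    print(expand("I 5;US 99"))
    print(shield_label("WA 99-A"))
```

In a real pipeline each pair returned by `expand` would be attached to its own copy of the line geometry, which is the "two lines with one label apiece" result described above.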
Now we have a new label set that's been, you know, processed, and it has these de-duplicated lines and it's dissolved. It's a lot cleaner, and now we can run it through nodify.py, and these were the requirements for what went into that script. It needed to have the label that we created with all that regex, and the shield type: is it a US highway or an interstate? It was also dissolved and split. Nodify.py has two functions. The first one adds a bunch of nodes and the second one goes back and deletes a bunch of them. We didn't go into this process knowing off the top of our heads that zoom 5, which is about the extent of the continental US, needs this many shields to not look dense, so we really needed kind of an iterative process to go through and explore how many shields look right for Tableau, and eventually we settled on this: this many points, and within this radius you can only have one of that label type. And that's what nodify is -- it just went through and it's like, ooh, you're the same label, you're too close, and it deletes it.

So this is literally what it looked like undissolved. We didn't really do a lot of processing on the lines, just said show me a label, and as you can see, it's pretty dense. There's a scarcity of shield labels up here in this corner. There's a ton that are just the same thing and are really close together, and after we ran it through this process, this is what it looks like. We now have shields up here that weren't shown before, because we dissolved into longer segments. You know, we could pick them up in the measurement; they weren't being read as these short little segments that probably didn't have a shield. They were actually longer roads that could have shields. We also eliminated enough of the duplicate labels that we were picking up new labels in the back. Oh, good, I know how to work this laptop. [laughter] So you can kind of see the difference.
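The two-step idea behind nodify.py, add nodes and then delete the ones that crowd each other, can be sketched as below. This is an assumed reconstruction, not the published script. `densify`, `thin`, the spacing and the radius are hypothetical names and values, and coordinates are treated as planar (already projected), which is a simplification.

```python
import math


def densify(line, spacing):
    """Step 1: place a candidate point every `spacing` units along a polyline.

    `line` is a list of (x, y) vertices in a projected coordinate system.
    """
    points = [line[0]]
    carried = 0.0  # distance walked since the last emitted point
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        d = spacing - carried  # offset of the next point within this segment
        while d <= seg:
            t = d / seg
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            d += spacing
        carried = seg - (d - spacing)
    return points


def thin(points, radius):
    """Step 2: drop a point if a kept point with the same label is within `radius`.

    `points` is a list of (label, x, y). The scan is greedy and O(n^2);
    planet-sized input would need a spatial index instead.
    """
    kept = []
    for label, x, y in points:
        too_close = any(
            other == label and math.hypot(x - kx, y - ky) < radius
            for other, kx, ky in kept
        )
        if not too_close:
            kept.append((label, x, y))
    return kept


if __name__ == "__main__":
    candidates = [("5", x, y) for x, y in densify([(0, 0), (10, 0)], 5)]
    candidates.append(("99", 1, 0))  # different label, so it is never pruned by "5"
    print(thin(candidates, 8))
```

Rerunning `thin` with a different radius per zoom level is one way to get the quick iteration on spacing values that the talk describes. Points with different labels never prune each other here, which matches the behavior discussed in the Q&A.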
And all of this is really good stuff, because when you do these types of edits in the back end, you're not asking the front-end label rendering closure system to do the work. So it runs both fast and accurate. The final process is styling the output, and like I said, we use MapServer for our rendering, and some designers who work very hard at Tableau helped design this font file, which basically is a bunch of stacked fonts, as you can see. There's, you know, that little, I don't know what they call it, the bracket on the top of the interstate shield, so we literally were stacking these different size fonts on top of each other to produce that graphic that you see on the left. And it was, you know, a pretty nice system. So we didn't have the case of your label overrunning your shield, which never looks great. Lastly, like I mentioned, we're making our code publicly available. Feel free to have fun with it. That's where it is. Matt?

>> Matt: Yeah, so like Ari said, kind of thinking about concluding thoughts: through the development of these scripts, we were kind of left with the ability to quickly iterate on these differing node values, spacing values, to actually create these points, right? And so that process became fast. The process that was still actually somewhat time consuming was the research that it took to suss through the data and identify what, you know, particular types of regular expressions were required to disambiguate, you know, a US interstate from a state route or something like that. So you'll notice that in our dataset we actually have just a US-specific set of shields that are subset into interstate, state route, etc., and then a global set of shields, right?
And so kind of where I think it would be interesting is -- you know, there's the opportunity for collaboration to get research from other people into what it takes to disambiguate roads in New Zealand or roads in Japan, for example, and I just kind of think there's an interesting opportunity there to collaborate and kind of expand, you know, the understanding of road shields, for us certainly, beyond just US versus global. So, yeah, with that, thanks for giving us a chance to talk, and if there's any questions, we'll be happy to answer them. [applause]

>> AUDIENCE MEMBER: I'm curious why you chose to use shapefiles? Were you using some ESRI software or something?

>> Yeah, our map renderer is MapServer, right, and so based on our testing we decided to use shapefiles as a backing store format because it was the most performant for MapServer, essentially. Yeah.

>> AUDIENCE MEMBER: Have you taken advantage of the OSM relations for highway routes?

>> Matt: Yeah, I don't think -- we -- yeah, if there's something that could be useful for us, could you explain what would be helpful in using those data?

>> AUDIENCE MEMBER: So for example, all of Interstate 5 within a given state is going to be in a relation. So you know, some of that disambiguation you were talking about should be handled by that, and the same with the state routes, I think down to the county level stuff. So this whole issue of dissolving should also be made a lot easier because, you know, every time the speed limit changes, you get a different OSM way, but all of those ways are joined together in a relation.

>> Awesome, thanks.

>> What was your criteria in terms of spacing the shields for looks good?

>> Ari: Thanks, Matt. I could honestly say those decisions were made by another team, the design team, which thinks -- spends most of their time thinking about how other people's data looks good in our product. So was there a criteria? Was it formal?
Not today, so I think we probably got close, they stopped calling us, so ...

>> AUDIENCE MEMBER: Hi there. You talked about pruning nodes that have the same name, the same number on them. Did you put any thought into pruning nodes that have different numbers, say, to avoid having clustering around an intersection or a municipality?

>> That specifically, no, we didn't. And you know, it is likely that some of the nodes that are missing when we actually see the rendered product are missing because they are taken out by our closure algorithm. We're hoping not too many, because the more that we make the front end do, the slower the service runs, but yeah, we didn't do that particular set, but that would be a great idea for future work.

>> Thanks, everybody. [applause]

>> Oh, and also, if anybody likes stickers, I have a bunch of CUGA stickers. They picture one of our owners, saving the space needle with a drone ... ... [break]